How to fit an ordinal regression (unconditional/conditional) in the Bayesian framework
Compare 2 groups
Ordinal responses are ordered categories; examples include Likert scales of agreement with attitude statements (e.g., disagree, neither agree nor disagree, agree) and self-reported frequencies of doing something, such as helping children with homework (e.g., daily, several times per week, occasionally, never).
Ordinal responses are often modeled with probit regression.
In the probit approach there is a Gaussian latent variable \(z \sim N(\mu, \sigma)\) that may depend on some predictors.
The latent variable \(z\) defines the ordinal output \(y = k,\ k \in \{1, \dots, K\}\), through the formulas
\[\begin{align*} P(y = 1 \mid \mu, \sigma, \{\theta_j\}) &= \Phi\Big(\frac{\theta_1 - \mu}{\sigma}\Big) \\ P(y = k \mid \mu, \sigma, \{\theta_j\}) &= \Phi\Big(\frac{\theta_k - \mu}{\sigma}\Big) - \Phi\Big(\frac{\theta_{k-1} - \mu}{\sigma}\Big), \quad &k = 2, \dots, K-1 \\ P(y = K \mid \mu, \sigma, \{\theta_j\}) &= 1 - \Phi\Big(\frac{\theta_{K-1} - \mu}{\sigma}\Big) \end{align*}\]
Consider the simplest case: the latent variable is standard normal, there are no predictors, and the threshold parameters have a uniform prior.
There is only one ordinal variable.
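Before turning to Stan, the formulas above can be checked numerically. Below is a small stdlib-Python sketch (Python only so the snippet is self-contained; the thresholds, mu and sigma are made-up illustrative values, not estimates from this post):

```python
# Illustrative check of the ordinal-probit formulas above.
# The thresholds and (mu, sigma) here are made-up values for demonstration.
from statistics import NormalDist

def ordinal_probit_probs(thresholds, mu=0.0, sigma=1.0):
    """P(y = k), k = 1..K, given ordered thresholds theta_1 < ... < theta_{K-1}."""
    Phi = NormalDist().cdf                     # standard normal CDF
    z = [Phi((t - mu) / sigma) for t in thresholds]
    probs = [z[0]]                             # k = 1
    probs += [z[k] - z[k - 1] for k in range(1, len(z))]  # k = 2..K-1
    probs.append(1.0 - z[-1])                  # k = K
    return probs

p = ordinal_probit_probs([-1.0, 0.0, 1.0])     # K = 4 categories
print([round(x, 4) for x in p])                # [0.1587, 0.3413, 0.3413, 0.1587]
```

The probabilities sum to 1 by construction, since consecutive CDF differences partition the real line.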
modelString <- "
data {
int<lower=2> K; // num of y classes
int<lower=0> N; // num of observations
int<lower=1,upper=K> y[N];
}
parameters {
ordered[K-1] c; // thresholds
}
model {
  vector[K] theta;
  // theta does not depend on n, so compute it once outside the loop
  theta[1] = Phi(c[1]);
  for (k in 2:(K-1))
    theta[k] = Phi(c[k]) - Phi(c[k-1]);
  theta[K] = 1 - Phi(c[K-1]);
  for (n in 1:N)
    y[n] ~ categorical(theta);
}
"
Note that in the model above no prior is defined for c.
An omitted prior is equivalent to a uniform prior: if the variable is declared with bounds, the implicit prior is uniform over the bounded interval; if the variable is unbounded, the prior is an improper uniform.
Read the data.
The sample is just one column of ratings between 1 and 7.
Calculate frequencies of different levels of the rating.
mdta = read.csv(file = "data/OrdinalProbitData-1grp.csv")
head(mdta)
Y
1 1
2 1
3 1
4 1
5 1
6 1
table(mdta$Y)
1 2 3 4 5 6 7
58 15 12 8 4 2 1
(frequencies <- prop.table(table(mdta$Y)))
1 2 3 4 5 6 7
0.58 0.15 0.12 0.08 0.04 0.02 0.01
Compile the model DSO and run Markov Chains.
model <- stan_model(model_code=modelString)
fit <- sampling(model,
                data=list(N=nrow(mdta),  # num of observations
                          K=max(mdta$Y), # num of outcome classes
                          y=mdta$Y),
                pars=c('c'),
                iter=5000, chains = 2, cores = 2)
# analyze fitted model using shinystan
launch_shinystan(fit)
Check the chains.
pairs(fit)
(fitResults <- summary(fit)$summary[,c(1,4,6,8,10)])
mean 2.5% 50% 97.5% Rhat
c[1] 0.1811849 -0.06299102 0.1822527 0.4300299 0.9997891
c[2] 0.6070660 0.35191827 0.6040370 0.8689709 0.9998762
c[3] 1.0480714 0.74641862 1.0463543 1.3666270 0.9998301
c[4] 1.5156082 1.16003395 1.5070613 1.9045675 0.9999335
c[5] 1.9875471 1.53136095 1.9734539 2.5301832 1.0000932
c[6] 2.6172553 1.94435433 2.5745726 3.5445753 0.9998225
lp__ -138.0311040 -142.37436655 -137.6879538 -135.5741869 0.9998792
Returned parameters are 6 thresholds separating 7 ordinal categories.
stan_ac(fit, separate_chains = T)
stan_trace(fit)
stan_dens(fit)
Compare the estimated thresholds, transformed to probabilities by pnorm, with the cumulative frequencies of the sample.
cbind(Cumulativefrequencies=head(cumsum(frequencies)),
EstimatedProbabilities=pnorm(head(fitResults[,1])))
Cumulativefrequencies EstimatedProbabilities
1 0.58 0.5718888
2 0.73 0.7280965
3 0.85 0.8526971
4 0.93 0.9351908
5 0.97 0.9765691
6 0.99 0.9955680
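As a quick sanity check outside of R, the same comparison can be reproduced with stdlib Python: applying the standard normal CDF to the posterior mean thresholds reported above should roughly recover the sample's cumulative frequencies (the numbers below are copied from the tables in this post):

```python
# Posterior mean thresholds and sample cumulative frequencies from the
# tables above; Phi(c_k) should approximate the cumulative frequencies.
from statistics import NormalDist

post_mean_c = [0.1811849, 0.6070660, 1.0480714, 1.5156082, 1.9875471, 2.6172553]
cum_freq    = [0.58, 0.73, 0.85, 0.93, 0.97, 0.99]

Phi = NormalDist().cdf
for c, f in zip(post_mean_c, cum_freq):
    print(round(Phi(c), 4), f)   # e.g. 0.5719 vs 0.58
```

The agreement is close but not exact; the largest gap (at the first threshold) is under 0.01.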
Plot HDI of the parameters.
plot(fit)
The HDI intervals overlap, which may seem counter-intuitive because the thresholds have to be ordered. The ordering constraint holds within each individual draw; the marginal intervals of neighboring thresholds can still overlap.
Extract the chains.
fit_ext <- rstan::extract(fit)
fit_ext <- fit_ext$c
head(fit_ext)
iterations [,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.35116666 0.7537651 0.9995782 1.288744 1.567238 2.221483
[2,] 0.09469918 0.5023792 0.9997730 1.284909 1.948384 2.698532
[3,] 0.06120643 0.7224987 1.0377697 1.395147 1.999715 2.478845
[4,] 0.03202448 0.4761984 0.9188529 1.516598 1.913221 2.842915
[5,] 0.21959980 0.5235180 1.2094977 1.838333 2.382429 2.755269
[6,] 0.06633391 0.6574331 1.1090569 1.551895 1.889166 2.678495
Plot first 6 rows of the chains to see that at each step of the chain thresholds are ordered.
plot(fit_ext[1,],rep(1,6),ylim=c(1,6),xlim=c(0,3),col=1,pch=16)
points(fit_ext[2,],rep(2,6),col=2,pch=16)
points(fit_ext[3,],rep(3,6),col=3,pch=16)
points(fit_ext[4,],rep(4,6),col=4,pch=16)
points(fit_ext[5,],rep(5,6),col=5,pch=16)
points(fit_ext[6,],rep(6,6),col=6,pch=16)
Two ordinal variables may need to be compared, for example, in the following situations:
1- Two groups of participants are asked to answer a questionnaire with
the same scale of ordinal categories, for example, “Strongly
Disagree”, “Disagree”, “Undecided”,
“Agree”, “Strongly Agree”. But the two groups are asked
about their agreement/disagreement with two different statements about
social issues: “Left-handed people should have equal rights under the
law” for group 1, and “Disabled people should be given equal rights
under the law” for group 2.
The assumption is that people in both groups have the same latent
variable, call it “sense of fairness”, but the two
social issues elicit different responses, formalized as different mean
values and standard deviations of the latent variable with respect to
the issues.
The goal of the experiment may be the analysis of the contrast between
the two statements, i.e. how significant the difference between the two
statements is.
2- Two groups of participants are asked to rate two different products and select a level of satisfaction from 1 (lowest) to 10 (highest). Again the scale of ordinal responses is the same. The assumption is that the latent variable of evaluation of each product, “satisfaction”, is shared by both groups of participants, but different products result in different mean values and variances of the latent variable. The goal of the study is understanding the difference between the products.
Write the model description for 2 nominal groups.
modelString <- "
data {
int<lower=2> K; // num of y classes
int<lower=0> N; // num of observations
int<lower=1> D; // num of groups
int<lower=1,upper=K> y[N]; // response
int<lower=1,upper=D> x[N]; // group index: each observation has its own group
}
parameters {
ordered[K-3] c_raw;
vector<lower=K/1000.0, upper=K*10.0>[D] sigma; // group sigma; bounds match the uniform prior below
vector[D] beta; // group mean
}
transformed parameters{ // renormalize vector c
vector[K-1] c;
c[1] = 1.5;
for (k in 2:(K-2))
c[k] = 1.5 + (K - 2.0)* inv_logit(c_raw[k-1]); // sigmoid
c[K-1] = K - 0.5;
}
model {
vector[K] theta;
real mu;
real sgm;
beta ~ normal((1+K)/2.0, K);
sigma ~ uniform(K/1000.0, K * 10.0);
for (n in 1:N) {
mu = beta[x[n]];
sgm = sigma[x[n]];
theta[1] = Phi((c[1] - mu)/sgm);
for (k in 2:(K-1))
theta[k] = Phi((c[k] - mu)/sgm) - Phi((c[k-1] - mu)/sgm);
theta[K] = 1 - Phi((c[K-1] - mu)/sgm);
y[n] ~ categorical(theta);
}
}"
Re-scaling the thresholds (adding a constant and dividing by a
constant) does not change the probabilities if the same transformation
is applied to the latent distribution.
To remove these degrees of freedom, “pin down” the extreme thresholds
at 1.5 and \(K - 0.5\) respectively and re-scale the remaining
thresholds between them.
This is what the “transformed parameters” block above does.
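The pinning transform is easy to sketch outside Stan. Below is a stdlib-Python version (the `c_raw` values are made up; they only need to be ordered, as in the Stan `ordered` type):

```python
# Sketch of the threshold "pinning" from the transformed parameters block:
# the extreme thresholds are fixed at 1.5 and K - 0.5, and the K - 3 interior
# raw ordered values are squeezed between them with a sigmoid.
import math

def pin_thresholds(c_raw, K):
    """Map an ordered vector c_raw (length K - 3) to K - 1 pinned thresholds."""
    inv_logit = lambda x: 1.0 / (1.0 + math.exp(-x))
    c = [1.5]                                             # c[1] pinned
    c += [1.5 + (K - 2.0) * inv_logit(r) for r in c_raw]  # interior, in (1.5, K - 0.5)
    c.append(K - 0.5)                                     # c[K-1] pinned
    return c

K = 5
c = pin_thresholds([-0.3, 0.8], K)   # made-up ordered c_raw
print([round(x, 3) for x in c])
```

Because `inv_logit` is monotone and bounded in (0, 1), an ordered `c_raw` always yields ordered thresholds strictly inside the pinned endpoints.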
Create DSO.
model <- stan_model(model_code=modelString)
Read the data.
mdt = read.csv(file = "data/OrdinalProbitData1.csv", stringsAsFactors=TRUE)
head(mdt)
X Y
1 A 1
2 A 1
3 A 1
4 A 1
5 A 1
6 A 1
table(mdt)
Y
X 1 2 3 4 5
A 31 8 4 1 0
B 22 11 7 3 1
The dataset now has 2 groups A and B, showing different distributions on the 5 ordinal categories.
Run the chains.
fit <- sampling(model,
                data=list(N=nrow(mdt),  # num of observations
                          K=max(mdt$Y), # num of outcome classes
                          D=nlevels(mdt$X),
                          y=mdt$Y,
                          x=as.integer(mdt$X)),
                pars=c('c', 'beta', 'sigma'),
                iter=5000, chains = 2, cores = 2)
Analyze the fitted model.
pairs(fit)
stan_ac(fit, separate_chains = T)
stan_trace(fit)
stan_dens(fit)
summary(fit)
$summary
mean se_mean sd 2.5% 25%
c[1] 1.5000000 NaN 0.0000000 1.5000000 1.5000000
c[2] 2.5687264 0.004564235 0.2495291 2.1297057 2.3935228
c[3] 3.6650824 0.005892113 0.3162772 3.0108633 3.4515403
c[4] 4.5000000 NaN 0.0000000 4.5000000 4.5000000
beta[1] 0.4810617 0.012087812 0.5324673 -0.8054914 0.2065422
beta[2] 1.4104089 0.006548890 0.3521547 0.6278093 1.1962206
sigma[1] 1.7714314 0.013797419 0.5486677 0.9494024 1.3924604
sigma[2] 1.7536222 0.007729298 0.3849482 1.1429208 1.4888102
lp__ -96.5054790 0.051380648 2.0078407 -101.2465659 -97.5325740
50% 75% 97.5% n_eff Rhat
c[1] 1.5000000 1.5000000 1.500000 NaN NaN
c[2] 2.5505579 2.7310103 3.088581 2988.867 0.9997306
c[3] 3.6867641 3.9003436 4.209861 2881.335 0.9996982
c[4] 4.5000000 4.5000000 4.500000 NaN NaN
beta[1] 0.5700466 0.8559553 1.262025 1940.397 0.9999410
beta[2] 1.4373681 1.6476200 2.032807 2891.555 1.0006034
sigma[1] 1.6861781 2.0671413 3.075729 1581.332 0.9999935
sigma[2] 1.7001947 1.9638541 2.649887 2480.416 1.0005770
lp__ -96.1183970 -95.0223348 -93.752522 1527.071 0.9999967
$c_summary
, , chains = chain:1
stats
parameter mean sd 2.5% 25% 50%
c[1] 1.5000000 0.0000000 1.5000000 1.5000000 1.5000000
c[2] 2.5656778 0.2504301 2.1163268 2.3908231 2.5493339
c[3] 3.6688164 0.3169793 3.0266797 3.4572647 3.6870800
c[4] 4.5000000 0.0000000 4.5000000 4.5000000 4.5000000
beta[1] 0.4831698 0.5340802 -0.7738274 0.1930877 0.5734193
beta[2] 1.4199808 0.3544642 0.6193359 1.2114230 1.4483583
sigma[1] 1.7703654 0.5460767 0.9637670 1.3980432 1.6967757
sigma[2] 1.7439336 0.3835729 1.1482349 1.4754805 1.6862058
lp__ -96.5209197 2.0575388 -101.4332967 -97.5533587 -96.1147975
stats
parameter 75% 97.5%
c[1] 1.500000 1.500000
c[2] 2.725942 3.076784
c[3] 3.911798 4.210600
c[4] 4.500000 4.500000
beta[1] 0.859498 1.268055
beta[2] 1.652155 2.050165
sigma[1] 2.061358 3.023260
sigma[2] 1.950170 2.643014
lp__ -95.038429 -93.752162
, , chains = chain:2
stats
parameter mean sd 2.5% 25% 50%
c[1] 1.5000000 0.0000000 1.5000000 1.5000000 1.5000000
c[2] 2.5717750 0.2486376 2.1467098 2.3946781 2.5513959
c[3] 3.6613483 0.3155928 3.0004842 3.4476953 3.6846535
c[4] 4.5000000 0.0000000 4.5000000 4.5000000 4.5000000
beta[1] 0.4789535 0.5309480 -0.8431789 0.2261896 0.5658407
beta[2] 1.4008371 0.3496387 0.6568121 1.1796308 1.4286961
sigma[1] 1.7724974 0.5513536 0.9431984 1.3861733 1.6731259
sigma[2] 1.7633108 0.3861523 1.1305726 1.4973936 1.7159548
lp__ -96.4900383 1.9571712 -100.9751571 -97.5118612 -96.1227563
stats
parameter 75% 97.5%
c[1] 1.5000000 1.500000
c[2] 2.7343485 3.099431
c[3] 3.8882334 4.209746
c[4] 4.5000000 4.500000
beta[1] 0.8554803 1.243746
beta[2] 1.6452455 2.006866
sigma[1] 2.0728024 3.100717
sigma[2] 1.9807715 2.657443
lp__ -95.0109336 -93.759982
plot(fit,pars=c("beta"))
The HDIs show that even at the 80% level the difference between the two group means is not significant.
plot(fit,pars=c("sigma"))
The standard deviations are practically identical.
What would the frequentist (FNP) approach conclude about these data?
First, assume that the data are of metric type and check whether a t-test is able to distinguish the means.
subsetA <- subset(mdt,X=="A")
subsetB <- subset(mdt,X=="B")
t.test(subsetA$Y,subsetB$Y)
Welch Two Sample t-test
data: subsetA$Y and subsetB$Y
t = -2.1838, df = 77.571, p-value = 0.032
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.82551722 -0.03811914
sample estimates:
mean of x mean of y
1.431818 1.863636
At any significance level greater than 0.032 the hypothesis of equal means is rejected.
In the book this is explained by non-normality of the data.
Confirm this with qq-plot.
qqnorm(subsetA$Y)
qqline(subsetA$Y)
qqnorm(subsetB$Y)
qqline(subsetB$Y)
The distributions are indeed not normal. But the main reason for rejecting the null hypothesis is probably the discrete format of the data.
Compare the samples assuming that they have a multinomial distribution, then apply a \(\chi^2\) test.
chisq.test(c(table(subsetA$Y)/length(subsetA$Y),0),table(subsetB$Y)/length(subsetB$Y))
Pearson's Chi-squared test
data: c(table(subsetA$Y)/length(subsetA$Y), 0) and table(subsetB$Y)/length(subsetB$Y)
X-squared = 20, df = 16, p-value = 0.2202
With a properly chosen method, the hypothesis of equal distributions is not rejected.
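For intuition, the classical two-sample homogeneity \(\chi^2\) statistic can also be computed by hand on the raw counts from table(mdt) above (a slightly different formalization than the chisq.test call on proportions; stdlib Python only, so only the statistic is computed and compared to the 5% critical value for df = 4, about 9.488):

```python
# Two-sample chi-square homogeneity statistic on the raw counts from
# table(mdt): rows A and B over the 5 ordinal categories.
counts_A = [31, 8, 4, 1, 0]
counts_B = [22, 11, 7, 3, 1]

nA, nB = sum(counts_A), sum(counts_B)
N = nA + nB
chi2 = 0.0
for a, b in zip(counts_A, counts_B):
    col = a + b
    eA, eB = nA * col / N, nB * col / N   # expected counts under equal distributions
    chi2 += (a - eA) ** 2 / eA + (b - eB) ** 2 / eB

print(round(chi2, 2))   # 4.82, well below the 9.488 critical value: no rejection
```

This agrees with the conclusion above; note, though, that the \(\chi^2\) approximation is shaky here because several expected counts fall below 5.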
Plot the draw-wise row means of the thresholds against the simulated thresholds.
draws <- extract(fit)
c1 <- draws$c[,1]
c2 <- draws$c[,2]
c3 <- draws$c[,3]
c4 <- draws$c[,4]
mc <- rowMeans(draws$c)
head(cbind(c1,c4))
c1 c4
[1,] 1.5 4.5
[2,] 1.5 4.5
[3,] 1.5 4.5
[4,] 1.5 4.5
[5,] 1.5 4.5
[6,] 1.5 4.5
head(mc)
[1] 3.096186 3.176928 2.952425 3.217452 3.061712 2.988291
plot(c1, mc, xlim = c(1,5),xlab="c")
points(c2,mc)
points(c3,mc)
points(c4,mc)
abline(v=mean(c2))
abline(v=mean(c3))
(to be continued)
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/hai-mn/hai-mn.github.io, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Nguyen (2022, April 4). HaiBiostat: Series 8 - Ordinal Response & Probit Model. Retrieved from https://hai-mn.github.io/posts/2022-04-04-Bayesian methods - Series 8 of 10/
BibTeX citation
@misc{nguyen2022series,
  author = {Nguyen, Hai},
  title = {HaiBiostat: Series 8 - Ordinal Response & Probit Model},
  url = {https://hai-mn.github.io/posts/2022-04-04-Bayesian methods - Series 8 of 10/},
  year = {2022}
}